Housing Prices (Kaggle) by Aparna Radhakrishnan

##   Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape
## 1  1         60       RL          65    8450   Pave  <NA>      Reg
## 2  2         20       RL          80    9600   Pave  <NA>      Reg
## 3  3         60       RL          68   11250   Pave  <NA>      IR1
## 4  4         70       RL          60    9550   Pave  <NA>      IR1
## 5  5         60       RL          84   14260   Pave  <NA>      IR1
## 6  6         50       RL          85   14115   Pave  <NA>      IR1
##   LandContour Utilities LotConfig LandSlope Neighborhood Condition1
## 1         Lvl    AllPub    Inside       Gtl      CollgCr       Norm
## 2         Lvl    AllPub       FR2       Gtl      Veenker      Feedr
## 3         Lvl    AllPub    Inside       Gtl      CollgCr       Norm
## 4         Lvl    AllPub    Corner       Gtl      Crawfor       Norm
## 5         Lvl    AllPub       FR2       Gtl      NoRidge       Norm
## 6         Lvl    AllPub    Inside       Gtl      Mitchel       Norm
##   Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt
## 1       Norm     1Fam     2Story           7           5      2003
## 2       Norm     1Fam     1Story           6           8      1976
## 3       Norm     1Fam     2Story           7           5      2001
## 4       Norm     1Fam     2Story           7           5      1915
## 5       Norm     1Fam     2Story           8           5      2000
## 6       Norm     1Fam     1.5Fin           5           5      1993
##   YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType
## 1         2003     Gable  CompShg     VinylSd     VinylSd    BrkFace
## 2         1976     Gable  CompShg     MetalSd     MetalSd       None
## 3         2002     Gable  CompShg     VinylSd     VinylSd    BrkFace
## 4         1970     Gable  CompShg     Wd Sdng     Wd Shng       None
## 5         2000     Gable  CompShg     VinylSd     VinylSd    BrkFace
## 6         1995     Gable  CompShg     VinylSd     VinylSd       None
##   MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure
## 1        196        Gd        TA      PConc       Gd       TA           No
## 2          0        TA        TA     CBlock       Gd       TA           Gd
## 3        162        Gd        TA      PConc       Gd       TA           Mn
## 4          0        TA        TA     BrkTil       TA       Gd           No
## 5        350        Gd        TA      PConc       Gd       TA           Av
## 6          0        TA        TA       Wood       Gd       TA           No
##   BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF
## 1          GLQ        706          Unf          0       150         856
## 2          ALQ        978          Unf          0       284        1262
## 3          GLQ        486          Unf          0       434         920
## 4          ALQ        216          Unf          0       540         756
## 5          GLQ        655          Unf          0       490        1145
## 6          GLQ        732          Unf          0        64         796
##   Heating HeatingQC CentralAir Electrical X1stFlrSF X2ndFlrSF LowQualFinSF
## 1    GasA        Ex          Y      SBrkr       856       854            0
## 2    GasA        Ex          Y      SBrkr      1262         0            0
## 3    GasA        Ex          Y      SBrkr       920       866            0
## 4    GasA        Gd          Y      SBrkr       961       756            0
## 5    GasA        Ex          Y      SBrkr      1145      1053            0
## 6    GasA        Ex          Y      SBrkr       796       566            0
##   GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr
## 1      1710            1            0        2        1            3
## 2      1262            0            1        2        0            3
## 3      1786            1            0        2        1            3
## 4      1717            1            0        1        0            3
## 5      2198            1            0        2        1            4
## 6      1362            1            0        1        1            1
##   KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu
## 1            1          Gd            8        Typ          0        <NA>
## 2            1          TA            6        Typ          1          TA
## 3            1          Gd            6        Typ          1          TA
## 4            1          Gd            7        Typ          1          Gd
## 5            1          Gd            9        Typ          1          TA
## 6            1          TA            5        Typ          0        <NA>
##   GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual
## 1     Attchd        2003          RFn          2        548         TA
## 2     Attchd        1976          RFn          2        460         TA
## 3     Attchd        2001          RFn          2        608         TA
## 4     Detchd        1998          Unf          3        642         TA
## 5     Attchd        2000          RFn          3        836         TA
## 6     Attchd        1993          Unf          2        480         TA
##   GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch
## 1         TA          Y          0          61             0          0
## 2         TA          Y        298           0             0          0
## 3         TA          Y          0          42             0          0
## 4         TA          Y          0          35           272          0
## 5         TA          Y        192          84             0          0
## 6         TA          Y         40          30             0        320
##   ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold
## 1           0        0   <NA>  <NA>        <NA>       0      2   2008
## 2           0        0   <NA>  <NA>        <NA>       0      5   2007
## 3           0        0   <NA>  <NA>        <NA>       0      9   2008
## 4           0        0   <NA>  <NA>        <NA>       0      2   2006
## 5           0        0   <NA>  <NA>        <NA>       0     12   2008
## 6           0        0   <NA> MnPrv        Shed     700     10   2009
##   SaleType SaleCondition SalePrice
## 1       WD        Normal    208500
## 2       WD        Normal    181500
## 3       WD        Normal    223500
## 4       WD       Abnorml    140000
## 5       WD        Normal    250000
## 6       WD        Normal    143000

This is the training portion of the USA Housing dataset, downloaded from the Kaggle website (https://www.kaggle.com/gpandi007/usa-housing-dataset). The data contains the sale prices of houses in the USA along with their attributes.

## [1] 1460   81

The data contained 79 attributes (other than id and sale price) for 1460 houses.

##        Id           MSSubClass       MSZoning     LotFrontage    
##  Min.   :   1.0   Min.   : 20.0   C (all):  10   Min.   : 21.00  
##  1st Qu.: 365.8   1st Qu.: 20.0   FV     :  65   1st Qu.: 59.00  
##  Median : 730.5   Median : 50.0   RH     :  16   Median : 69.00  
##  Mean   : 730.5   Mean   : 56.9   RL     :1151   Mean   : 70.05  
##  3rd Qu.:1095.2   3rd Qu.: 70.0   RM     : 218   3rd Qu.: 80.00  
##  Max.   :1460.0   Max.   :190.0                  Max.   :313.00  
##                                                  NA's   :259     
##     LotArea        Street      Alley      LotShape  LandContour
##  Min.   :  1300   Grvl:   6   Grvl:  50   IR1:484   Bnk:  63   
##  1st Qu.:  7554   Pave:1454   Pave:  41   IR2: 41   HLS:  50   
##  Median :  9478               NA's:1369   IR3: 10   Low:  36   
##  Mean   : 10517                           Reg:925   Lvl:1311   
##  3rd Qu.: 11602                                                
##  Max.   :215245                                                
##                                                                
##   Utilities      LotConfig    LandSlope   Neighborhood   Condition1  
##  AllPub:1459   Corner : 263   Gtl:1382   NAmes  :225   Norm   :1260  
##  NoSeWa:   1   CulDSac:  94   Mod:  65   CollgCr:150   Feedr  :  81  
##                FR2    :  47   Sev:  13   OldTown:113   Artery :  48  
##                FR3    :   4              Edwards:100   RRAn   :  26  
##                Inside :1052              Somerst: 86   PosN   :  19  
##                                          Gilbert: 79   RRAe   :  11  
##                                          (Other):707   (Other):  15  
##    Condition2     BldgType      HouseStyle   OverallQual    
##  Norm   :1445   1Fam  :1220   1Story :726   Min.   : 1.000  
##  Feedr  :   6   2fmCon:  31   2Story :445   1st Qu.: 5.000  
##  Artery :   2   Duplex:  52   1.5Fin :154   Median : 6.000  
##  PosN   :   2   Twnhs :  43   SLvl   : 65   Mean   : 6.099  
##  RRNn   :   2   TwnhsE: 114   SFoyer : 37   3rd Qu.: 7.000  
##  PosA   :   1                 1.5Unf : 14   Max.   :10.000  
##  (Other):   2                 (Other): 19                   
##   OverallCond      YearBuilt     YearRemodAdd    RoofStyle   
##  Min.   :1.000   Min.   :1872   Min.   :1950   Flat   :  13  
##  1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967   Gable  :1141  
##  Median :5.000   Median :1973   Median :1994   Gambrel:  11  
##  Mean   :5.575   Mean   :1971   Mean   :1985   Hip    : 286  
##  3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004   Mansard:   7  
##  Max.   :9.000   Max.   :2010   Max.   :2010   Shed   :   2  
##                                                              
##     RoofMatl     Exterior1st   Exterior2nd    MasVnrType    MasVnrArea    
##  CompShg:1434   VinylSd:515   VinylSd:504   BrkCmn : 15   Min.   :   0.0  
##  Tar&Grv:  11   HdBoard:222   MetalSd:214   BrkFace:445   1st Qu.:   0.0  
##  WdShngl:   6   MetalSd:220   HdBoard:207   None   :864   Median :   0.0  
##  WdShake:   5   Wd Sdng:206   Wd Sdng:197   Stone  :128   Mean   : 103.7  
##  ClyTile:   1   Plywood:108   Plywood:142   NA's   :  8   3rd Qu.: 166.0  
##  Membran:   1   CemntBd: 61   CmentBd: 60                 Max.   :1600.0  
##  (Other):   2   (Other):128   (Other):136                 NA's   :8       
##  ExterQual ExterCond  Foundation  BsmtQual   BsmtCond    BsmtExposure
##  Ex: 52    Ex:   3   BrkTil:146   Ex  :121   Fa  :  45   Av  :221    
##  Fa: 14    Fa:  28   CBlock:634   Fa  : 35   Gd  :  65   Gd  :134    
##  Gd:488    Gd: 146   PConc :647   Gd  :618   Po  :   2   Mn  :114    
##  TA:906    Po:   1   Slab  : 24   TA  :649   TA  :1311   No  :953    
##            TA:1282   Stone :  6   NA's: 37   NA's:  37   NA's: 38    
##                      Wood  :  3                                      
##                                                                      
##  BsmtFinType1   BsmtFinSF1     BsmtFinType2   BsmtFinSF2     
##  ALQ :220     Min.   :   0.0   ALQ :  19    Min.   :   0.00  
##  BLQ :148     1st Qu.:   0.0   BLQ :  33    1st Qu.:   0.00  
##  GLQ :418     Median : 383.5   GLQ :  14    Median :   0.00  
##  LwQ : 74     Mean   : 443.6   LwQ :  46    Mean   :  46.55  
##  Rec :133     3rd Qu.: 712.2   Rec :  54    3rd Qu.:   0.00  
##  Unf :430     Max.   :5644.0   Unf :1256    Max.   :1474.00  
##  NA's: 37                      NA's:  38                     
##    BsmtUnfSF       TotalBsmtSF      Heating     HeatingQC CentralAir
##  Min.   :   0.0   Min.   :   0.0   Floor:   1   Ex:741    N:  95    
##  1st Qu.: 223.0   1st Qu.: 795.8   GasA :1428   Fa: 49    Y:1365    
##  Median : 477.5   Median : 991.5   GasW :  18   Gd:241              
##  Mean   : 567.2   Mean   :1057.4   Grav :   7   Po:  1              
##  3rd Qu.: 808.0   3rd Qu.:1298.2   OthW :   2   TA:428              
##  Max.   :2336.0   Max.   :6110.0   Wall :   4                       
##                                                                     
##  Electrical     X1stFlrSF      X2ndFlrSF     LowQualFinSF    
##  FuseA:  94   Min.   : 334   Min.   :   0   Min.   :  0.000  
##  FuseF:  27   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000  
##  FuseP:   3   Median :1087   Median :   0   Median :  0.000  
##  Mix  :   1   Mean   :1163   Mean   : 347   Mean   :  5.845  
##  SBrkr:1334   3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000  
##  NA's :   1   Max.   :4692   Max.   :2065   Max.   :572.000  
##                                                              
##    GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath    
##  Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :1464   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565  
##  3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000  
##                                                                   
##     HalfBath       BedroomAbvGr    KitchenAbvGr   KitchenQual
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Ex:100     
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Fa: 39     
##  Median :0.0000   Median :3.000   Median :1.000   Gd:586     
##  Mean   :0.3829   Mean   :2.866   Mean   :1.047   TA:735     
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000              
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000              
##                                                              
##   TotRmsAbvGrd    Functional    Fireplaces    FireplaceQu   GarageType 
##  Min.   : 2.000   Maj1:  14   Min.   :0.000   Ex  : 24    2Types :  6  
##  1st Qu.: 5.000   Maj2:   5   1st Qu.:0.000   Fa  : 33    Attchd :870  
##  Median : 6.000   Min1:  31   Median :1.000   Gd  :380    Basment: 19  
##  Mean   : 6.518   Min2:  34   Mean   :0.613   Po  : 20    BuiltIn: 88  
##  3rd Qu.: 7.000   Mod :  15   3rd Qu.:1.000   TA  :313    CarPort:  9  
##  Max.   :14.000   Sev :   1   Max.   :3.000   NA's:690    Detchd :387  
##                   Typ :1360                               NA's   : 81  
##   GarageYrBlt   GarageFinish   GarageCars      GarageArea     GarageQual 
##  Min.   :1900   Fin :352     Min.   :0.000   Min.   :   0.0   Ex  :   3  
##  1st Qu.:1961   RFn :422     1st Qu.:1.000   1st Qu.: 334.5   Fa  :  48  
##  Median :1980   Unf :605     Median :2.000   Median : 480.0   Gd  :  14  
##  Mean   :1979   NA's: 81     Mean   :1.767   Mean   : 473.0   Po  :   3  
##  3rd Qu.:2002                3rd Qu.:2.000   3rd Qu.: 576.0   TA  :1311  
##  Max.   :2010                Max.   :4.000   Max.   :1418.0   NA's:  81  
##  NA's   :81                                                              
##  GarageCond  PavedDrive   WoodDeckSF      OpenPorchSF     EnclosedPorch   
##  Ex  :   2   N:  90     Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  Fa  :  35   P:  30     1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Gd  :   9   Y:1340     Median :  0.00   Median : 25.00   Median :  0.00  
##  Po  :   7              Mean   : 94.24   Mean   : 46.66   Mean   : 21.95  
##  TA  :1326              3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00  
##  NA's:  81              Max.   :857.00   Max.   :547.00   Max.   :552.00  
##                                                                           
##    X3SsnPorch      ScreenPorch        PoolArea        PoolQC    
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Ex  :   2  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   Fa  :   2  
##  Median :  0.00   Median :  0.00   Median :  0.000   Gd  :   3  
##  Mean   :  3.41   Mean   : 15.06   Mean   :  2.759   NA's:1453  
##  3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000              
##  Max.   :508.00   Max.   :480.00   Max.   :738.000              
##                                                                 
##    Fence      MiscFeature    MiscVal             MoSold      
##  GdPrv:  59   Gar2:   2   Min.   :    0.00   Min.   : 1.000  
##  GdWo :  54   Othr:   2   1st Qu.:    0.00   1st Qu.: 5.000  
##  MnPrv: 157   Shed:  49   Median :    0.00   Median : 6.000  
##  MnWw :  11   TenC:   1   Mean   :   43.49   Mean   : 6.322  
##  NA's :1179   NA's:1406   3rd Qu.:    0.00   3rd Qu.: 8.000  
##                           Max.   :15500.00   Max.   :12.000  
##                                                              
##      YrSold        SaleType    SaleCondition    SalePrice     
##  Min.   :2006   WD     :1267   Abnorml: 101   Min.   : 34900  
##  1st Qu.:2007   New    : 122   AdjLand:   4   1st Qu.:129975  
##  Median :2008   COD    :  43   Alloca :  12   Median :163000  
##  Mean   :2008   ConLD  :   9   Family :  20   Mean   :180921  
##  3rd Qu.:2009   ConLI  :   5   Normal :1198   3rd Qu.:214000  
##  Max.   :2010   ConLw  :   5   Partial: 125   Max.   :755000  
##                 (Other):   9

Univariate Plots Section

Distribution of Sale Prices

A look at the distribution of Sale Prices.

The distribution is right skewed and it seemed appropriate to carry out a log transformation. The next plot looks at the distribution of the log transformed prices.

The log transformed distribution follows an almost normal distribution. Thus for further analyses, I decided to use the log-transformed sale prices.
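A minimal sketch of the transformation in base R. The data frame name `train` and the column name `LogSalePrice` are assumptions made for illustration; the six prices are the ones printed at the top of this report, so the summaries are only indicative.

```r
# First six sale prices from the head() output above.
# `train` and `LogSalePrice` are assumed names, not taken from the original code.
train <- data.frame(SalePrice = c(208500, 181500, 223500, 140000, 250000, 143000))

# Base-10 logarithm of the sale price, stored as a new column.
train$LogSalePrice <- log10(train$SalePrice)

# Compare the spread before and after the transformation.
summary(train$SalePrice)
summary(train$LogSalePrice)
```

On the log scale a fixed distance corresponds to a fixed ratio of prices, which is what pulls in the long right tail.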

MSSubClass and MSZoning

MSSubClass defines the type of dwelling and MSZoning identifies the general zoning in the sale.

The classes for MSSubClass are available in the annex. MSSubClass is a combination of YearBuilt, HouseStyle and BldgType. Since it duplicates information held in those variables, I will be ignoring it in further analyses.

The MSZoning plot shows that most of the houses sold were of Residential Low Density zoning. Thus, I won’t be considering this variable either for further analyses.

Plots for data on Lot variables - LotFrontage, LotArea, LotShape & LotConfig

LotFrontage is the linear feet of street connected to property. LotArea is the size of the lot sold in square feet. LotShape describes the shape of the lot in terms of regularity. LotConfig gives the configuration of the lot sold.

The LotFrontage plot shows a right tailed distribution which is similar to sale price distribution. This requires further analyses.

The LotArea plot shows a very long tailed distribution but doesn’t really explain much. I decided to create a new variable categorising the data in the next section.

As expected, most of the houses had a regular lot shape. Just to explore the effect of the shape, I will be considering this for further analyses.

Most of the properties sold were on inside lots. Again, to explore the effect of lot configuration, I will be considering this for further analyses.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1300    7554    9478   10517   11602  215245

This plot looks more interesting and there could be a relationship of sale price with this variable.
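The categorised lot-area variable could be built with `cut()`, roughly as sketched here. The break points and labels are hypothetical: this report only mentions a 5000 to 10000 square feet group, so the other boundaries are placeholders. The six lot areas are those shown in the head() output.

```r
# Lot areas of the first six houses in the head() output.
train <- data.frame(LotArea = c(8450, 9600, 11250, 9550, 14260, 14115))

# Hypothetical break points; only the 5000-10000 bin is named in the report.
lot_breaks <- c(0, 5000, 10000, 15000, 20000, Inf)
lot_labels <- c("<5000", "5000-10000", "10000-15000", "15000-20000", ">20000")

# right = FALSE makes each bin include its lower bound and exclude its upper bound.
train$LotAreaCat <- cut(train$LotArea, breaks = lot_breaks, labels = lot_labels,
                        right = FALSE, ordered_result = TRUE)

table(train$LotAreaCat)
```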

Street and Alley data for the property sold.

Street gives type of road access and Alley describes the type of alley access to the property.

As most houses have a Paved Street and don’t have an alley, these variables will not be considered for further analyses.

Properties of the Land - LandContour & LandSlope

LandContour describes the flatness of the property and LandSlope describes the slope.

Most of the houses were on near flat or level land and most properties had a gentle slope. So I will not be considering these for further analyses.

Utilities and Conditions near the property

Utilities describes the types of utilities available, and Condition1 and Condition2 describe proximity to various conditions.

All these variables will be excluded from further analyses as there is very little variation in the data. Further description of the different conditions can be found in the annex.

Neighborhood

Neighborhood describes the physical locations within Ames city limits.

It will be interesting to see the effect of the different neighborhoods in terms of age and sale price. I hypothesize that the newer neighborhoods would have a higher sale price in comparison to the older neighborhoods.

BldgType & HouseStyle

BldgType describes the type of dwelling while HouseStyle describes the style.

Most of the houses in this dataset were single family homes. Also to note there were two different categories of townhouses. Although there is not much variation, it will be good to check the effect of the building type on sale price.

The houses were mostly single storey, followed by two storeys. I will be ignoring this variable in further analyses.

OverallQual & OverallCond

OverallQual rates the overall material and finish of the house and OverallCond rates the overall condition.

Most of the houses had an Overall Quality of 5 and 50% of the houses were in the range of 5 to 7. For Overall Condition, most houses were rated 5. I will be looking at the relationship of both these variables with SalePrice.
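Because both ratings run on a scale where 1 is the worst and 10 is the best, they can be stored as ordered factors. This is only a sketch of that idea; the vector name `overall_qual` is an assumption and the six values come from the head() output.

```r
# OverallQual of the first six houses in the head() output.
overall_qual <- c(7, 6, 7, 7, 8, 5)

# Declare all ten levels so that unused ratings still appear in tables and plots.
qual_f <- factor(overall_qual, levels = 1:10, ordered = TRUE)

# Ordered factors support comparisons between levels.
qual_f[1] > qual_f[2]

table(qual_f)
```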

YearBuilt & YearRemodAdd

YearBuilt is the year the house was built and YearRemodAdd is the year any remodelling or additions happened. If no renovations were done the year would be same for both the variables.

One would expect older houses to cost less, but since the data is dominated by homes built in 2000-2005 and the older homes are likely to have been remodelled, this variable may not be of much significance.

If the houses that were built in the early years were remodelled, that might need to be factored in.

RoofStyle & RoofMatl

These variables look at the roof style & material.

## [1] "Roof Material"
## ClyTile CompShg Membran   Metal    Roll Tar&Grv WdShake WdShngl 
##       1    1434       1       1       1      11       5       6
## [1] "Roof Style"
##    Flat   Gable Gambrel     Hip Mansard    Shed 
##      13    1141      11     286       7       2

I will be ignoring these variables as there is not much variation in the results.

Exterior1st & Exterior2nd

These variables categorise the exterior coverings of the houses sold.

These plots show that the exterior covering is mainly vinyl siding. As there is some variation in the values, I will be checking its effect on sale price.

MasVnrType & MasVnrArea

These variables describe the masonry veneer type and area.

Since most of the houses had no masonry veneer, I shall be ignoring both these variables.

ExterQual & ExterCond

The quality and condition of the exterior for the houses are described in these variables.

Although the variation is limited, I would like to see their relationship with sale price and overall quality and condition.

Foundation

Foundation describes the type of foundation of the houses sold.

I wonder if any of these foundations increased the sale price significantly?

BsmtQual & BsmtCond

BsmtQual and BsmtCond describe the quality and condition of the basement (if available) for the houses sold.

I shall keep these variables to see if there’s any effect on sale price.

BsmtExposure

BsmtExposure refers to walkout or garden level walls.

I will be ignoring this variable as there is not much variation.

BsmtFinType1 & BsmtFinSF1

Ratings and square footage of Basement Finished Area are described below.

BsmtFinType2 & BsmtFinSF2

BsmtUnfSF

The unfinished square footage of basement.

I will be ignoring the square footage as there is a variable with total square footage of the basement which I would like to consider.

I wonder if having living quarters or a rec room in the basement increases the sale price. Therefore I am converting the Finished Basement Types to a new variable called FinBsmt. Also, to check whether having living quarters rather than a rec room made a difference to the price, I added a new variable, LivOrRec.

I think it would be nice to see the effect of having a living quarters or recreation room on the price.
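One way the two basement variables could be derived is sketched below. The level codes (GLQ, ALQ and BLQ for living quarters, Rec for a recreation room, LwQ for low quality, Unf for unfinished) follow the data dictionary, but the output labels, the treatment of LwQ as finished, and the helper vectors `living` and `finished` are assumptions. The rows are made up to cover each case.

```r
# Illustrative rows only; NA means the house has no basement.
bsmt <- data.frame(
  BsmtFinType1 = c("GLQ", "ALQ", "Rec", "Unf", "BLQ", NA),
  BsmtFinType2 = c("Unf", "Rec", "Unf", "Unf", "LwQ", NA),
  stringsAsFactors = FALSE
)

living   <- c("GLQ", "ALQ", "BLQ")    # living-quarters ratings
finished <- c(living, "Rec", "LwQ")   # assumed: anything other than Unf or NA

# %in% returns FALSE for NA, so houses without a basement drop out naturally.
has_living <- bsmt$BsmtFinType1 %in% living | bsmt$BsmtFinType2 %in% living
has_rec    <- bsmt$BsmtFinType1 %in% "Rec" | bsmt$BsmtFinType2 %in% "Rec"
has_finish <- bsmt$BsmtFinType1 %in% finished | bsmt$BsmtFinType2 %in% finished

# FinBsmt: is any part of the basement finished?
bsmt$FinBsmt <- ifelse(has_finish, "Finished", "Unfinished/None")

# LivOrRec: living quarters, recreation room, both, or neither.
bsmt$LivOrRec <- ifelse(has_living & has_rec, "Both",
                 ifelse(has_living, "Living",
                 ifelse(has_rec, "Rec", "Neither")))

bsmt
```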

TotalBsmtSF

Heating & HeatingQC

These variables describe the type of heating and its quality and condition.

Although there is not much variation in the type of heating, I shall keep it to see if there’s a change in type dependent on the Year Remodelled. HeatingQC might have an effect on the price of the houses especially if they are sold in the winter months.

CentralAir & Electrical

CentralAir tells whether the house has central air conditioning. Electrical describes the type of electrical system installed.

I wonder about the effect of central air conditioning on price, and whether the electrical system depends on the year remodelled.

X1stFlrSF & X2ndFlrSF

These variables give the square footage of the first and second floors.

LowQualFinSF

LowQualFinSF gives the Low quality finished square feet (all floors).

The above plot does not show much information as most houses had 0 sq ft that was low quality finished. Thus, in the next plot I subsetted the data to those with more than 0 sq ft and looked at the distribution in the form of a box plot.
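The subsetting step might look like the following sketch. The values are made up for illustration (the real column is mostly zeros with a maximum of 572), and `low_qual` is an assumed name.

```r
# Made-up values that mimic a column dominated by zeros.
train <- data.frame(LowQualFinSF = c(0, 0, 0, 360, 0, 80, 0, 572, 0, 205))

# Keep only houses that have some low-quality finished area.
low_qual <- subset(train, LowQualFinSF > 0)
nrow(low_qual)

# Tukey's five-number summary; boxplot(low_qual$LowQualFinSF) would draw the plot itself.
fivenum(low_qual$LowQualFinSF)
```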

Since the next variable, GrLivArea is the sum of the above variables (X1stFlrSF, X2ndFlrSF and LowQualFinSF), I will not be considering these variables for the rest of the analyses.
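The relationship can be checked directly on the six rows printed in the head() output; `floor_sum` is an assumed helper name.

```r
# Floor areas and GrLivArea for the first six houses in the head() output.
train <- data.frame(
  X1stFlrSF    = c(856, 1262, 920, 961, 1145, 796),
  X2ndFlrSF    = c(854, 0, 866, 756, 1053, 566),
  LowQualFinSF = c(0, 0, 0, 0, 0, 0),
  GrLivArea    = c(1710, 1262, 1786, 1717, 2198, 1362)
)

# Sum the three component areas for every house.
floor_sum <- with(train, X1stFlrSF + X2ndFlrSF + LowQualFinSF)

# TRUE for these six rows: the components add up to GrLivArea.
all(floor_sum == train$GrLivArea)
```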

GrLivArea (Square Footage)

Since square footage labelled as “GrLivArea” is probably one of the important factors for the pricing, I plotted its distribution.

Looking at the distribution, I decided to bin the variable into categories and store the result as a new variable. To choose the bins, I first looked at the summary of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     334    1130    1464    1515    1777    5642

Looking at the summary, I decided to bin the square footage into 5 categories:

  • <1100
  • 1100-1500
  • 1500-2000
  • 2000-3000
  • >3000
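The five bins above can be expressed with `cut()`, as in this sketch. How the boundary values (exactly 1100, 1500 and so on) are assigned is not stated in the report, so `right = FALSE` is an assumption; the areas are those of the first six houses in the head() output.

```r
# GrLivArea of the first six houses in the head() output.
train <- data.frame(GrLivArea = c(1710, 1262, 1786, 1717, 2198, 1362))

# Assumed convention: each bin includes its lower bound and excludes its upper bound,
# so a house of exactly 3000 sq ft would fall in the last bin.
train$LivAreaCat <- cut(
  train$GrLivArea,
  breaks = c(-Inf, 1100, 1500, 2000, 3000, Inf),
  labels = c("<1100", "1100-1500", "1500-2000", "2000-3000", ">3000"),
  right = FALSE,
  ordered_result = TRUE
)

table(train$LivAreaCat)
```

Making the result an ordered factor keeps the categories in size order in plots and tables.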

BsmtFullBath & FullBath

These variables give the number of full baths in the basement (BsmtFullBath) and above grade (FullBath).

HalfBath & BsmtHalfBath

These variables give the number of half baths above grade (HalfBath) and in the basement (BsmtHalfBath).

I would like to see the effect of the number of baths (full / half) on price but with a new variable “TotalBaths”.
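A sketch of how TotalBaths could be computed, assuming it is a plain sum of the four bath counts with each half bath counted as one. The report does not state the weighting, so treat that as an assumption. The counts are from the first six rows of the head() output.

```r
# Bath counts for the first six houses in the head() output.
train <- data.frame(
  BsmtFullBath = c(1, 0, 1, 1, 1, 1),
  BsmtHalfBath = c(0, 1, 0, 0, 0, 0),
  FullBath     = c(2, 2, 2, 1, 2, 1),
  HalfBath     = c(1, 0, 1, 0, 1, 1)
)

# Assumed definition: every bath, full or half, counts as one.
train$TotalBaths <- with(train, BsmtFullBath + BsmtHalfBath + FullBath + HalfBath)

train$TotalBaths
```

If half baths should carry less weight, the two half-bath terms could be multiplied by 0.5 instead.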

BedroomAbvGr & TotRmsAbvGrd

BedroomAbvGr and TotRmsAbvGrd describe number of bedrooms above grade (does NOT include basement bedrooms) & total rooms above grade (does not include bathrooms).

To see the effect of the number of bedrooms and total rooms above grade on sale price, I shall carry both variables forward to the next analyses.

KitchenAbvGr & KitchenQual

These variables describe the number of kitchens above grade and quality of kitchen.

Does having more than one kitchen or quality of kitchen have an effect on price?

Functional

Functional describes the level of Functionality of the house.

I shall ignore this variable as most houses have typical functionality.

Fireplaces & FireplaceQu

These variables describe the number of fireplaces and their quality.

Similar to heating, I wonder if having a fireplace increased the sale price in winter months. I shall be ignoring the fireplace quality.

GarageType

This variable describes the type of garage.

Most garages were Attached (Attchd), followed by Detached (Detchd). Will the type of garage affect sale price?

GarageYrBlt & GarageFinish

These variables describe the year the garage was built and its finish.

It will be interesting to compare the year the garage was built with the year the house was remodelled and the finish of the garage with the sale price.

GarageCars & GarageArea

The number of cars in the garage and the area of the garage are described by GarageCars and GarageArea respectively.

The number of cars in the garage and the area of the garage could be deciding factors for the sale of a house, so I shall be exploring these further.

GarageQual & GarageCond

The plots for these two variables reveal mostly “Typical/Average” and therefore I am not considering them for further analyses.

PavedDrive & WoodDeckSF

Since the variation in these variables is minimal, they will not be considered for further analyses.

OpenPorchSF, EnclosedPorch, X3SsnPorch & ScreenPorch

Since the porches are not part of all the houses, I have plotted only houses that have square footage greater than 0.

The above features will not be considered for further analyses.

PoolArea & PoolQC

## [1] "Number of houses with pools =  7"

Since only 7 houses have a pool, I am not considering these variables.

Fence

## [1] "Percentage of houses with fence= 19.2465753424658"

Less than 20% of the houses have a fence, so I am not considering this feature for further analyses.
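The pool count and the fence percentage above can be recovered from the counts in the summary() output, as sketched here. The vectors are rebuilt from those counts rather than read from the file, and counting non-missing PoolQC values (rather than, say, PoolArea > 0) is an assumption about how a pool is detected.

```r
# Rebuild the two columns from the counts shown in the summary() output.
pool_qc <- rep(c("Ex", "Fa", "Gd", NA), times = c(2, 2, 3, 1453))
fence   <- rep(c("GdPrv", "GdWo", "MnPrv", "MnWw", NA), times = c(59, 54, 157, 11, 1179))

# Houses with a pool: any non-missing pool quality (assumed definition).
n_pool <- sum(!is.na(pool_qc))

# Share of houses with a fence, as a percentage.
pct_fence <- mean(!is.na(fence)) * 100

paste("Number of houses with pools = ", n_pool)
paste("Percentage of houses with fence=", pct_fence)
```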

MiscFeature & MiscVal

Since miscellaneous features are not present in most houses sold, only houses with MiscVal > 0 and a MiscFeature other than None are considered.

The above variables will not be considered for further analyses.

MoSold & YrSold

Most sales seem to have happened during the summer months of June and July as expected.

The data is incomplete for 2010. Since the number of houses sold each year between 2006 and 2009 was similar, this variable should capture any fluctuation in the real estate market and will be interesting to explore.

I note that this data may not be real: the real estate market is known to have crashed in 2008 and did not really recover before 2010, yet this does not show up in the graphs of the number of houses sold during that period. I wonder if it will be obvious when compared with the sale prices during that time.

Mo/Year Sold

This plot clearly shows that the 2010 data only runs up to the beginning of July. Another insight is that June and July are the most popular months for buying. I wonder if the sale prices had any trends during the year.
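A combined month/year variable can be built from MoSold and YrSold as in the sketch below. Pinning each sale to the first day of its month is an assumption made only so that `as.Date()` has a complete date; `sales` and `SoldDate` are assumed names, and the six rows come from the head() output.

```r
# Month and year sold for the first six houses in the head() output.
sales <- data.frame(
  MoSold = c(2, 5, 9, 2, 12, 10),
  YrSold = c(2008, 2007, 2008, 2006, 2008, 2009)
)

# Build an ISO date string for the first day of the sale month, then parse it.
sales$SoldDate <- as.Date(sprintf("%d-%02d-01",
                                  as.integer(sales$YrSold),
                                  as.integer(sales$MoSold)))

# Count sales per calendar month.
table(format(sales$SoldDate, "%Y-%m"))
```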

SaleType

This variable doesn’t have much variation and thus will be ignored.

SaleCondition

The Sale Condition as expected was Normal for most of the houses and thus shall be ignored.


Univariate Analysis

What is the structure of your dataset?

There are 1,460 houses in the dataset with 81 features. I focused on 39 features (5 of which were derived). The OverallQual and OverallCond variables can be treated as ordered factors, with 1 being the worst and 10 the best.

Other observations:

  • The median sale price is $163,000 and the maximum price is $755,000. Because the distribution is right skewed, I log-transformed the sale price.
  • The lot area variable was categorised, and most houses had a lot area between 5000 and 10000 square feet. The distribution of lot frontage was right tailed. Some lots were irregular in shape, and most properties were on inside lots.
  • The North Ames neighbourhood had the most sales during the period.
  • Most of the houses sold were single family homes. Townhouses were either end or inside units.
  • Most of the houses had an Overall Quality of 5 and 50% of the houses were in the range of 5 to 7. The Overall Condition of the houses had a peak at 5.
  • Most sales seem to have happened during the summer months of June and July. The data is incomplete for 2010.
  • This data may not be real: the real estate market is known to have crashed in 2008 and did not really recover before 2010, yet this does not show up in the graphs of the number of houses sold during that period. I wonder if it will be obvious when compared with the sale prices during that time.
  • The data is dominated by houses built between 2000 and 2005.
  • The square footage of the houses sold was binned into 5 categories. Most of the houses sold were in the 1500 to 2000 square feet range.
  • Garage capacity varied from 0 to 4 cars and garage area from 0 to almost 1500 square feet. The garages might have been built at a later date and may not be completely finished. There is not much variation in the garage type, but whether it was attached or detached could affect sale prices.
  • Interestingly, some houses had more than one kitchen, and kitchen quality varied.
  • The number of bedrooms above grade ranged from 0 to 8, while the total number of rooms above grade varied from 2 to 14.
  • A new variable with the total number of bathrooms was computed and ranged from 1 to 6.
  • Most houses had central air conditioning, but it would be interesting to see its effect on sale price.
  • The electrical system and the type of heating varied, probably based on the year remodelled. Since the heating quality varied, I wonder if it affects the sale price, especially for houses sold in the winter months.
  • Two new variables were compiled from the basement variables. FinBsmt records whether the basement was finished or not, and LivOrRec records whether the basement has living quarters, a recreation room, or both.
  • The total basement square footage ranged from 0 to about 6000 square feet. There was some variability in basement quality and condition.
  • The two most common foundation types were poured concrete and cinder block.
  • Exterior Quality and Condition were mostly typical / average. Most houses had vinyl siding, but others had wood siding, cement board, metal siding or plywood.

What is/are the main feature(s) of interest in your dataset?

The main features in the data set are square footage and sale price. I wonder which other features will contribute to predicting the sale price of a house. I believe a combination of features will be needed to build a predictive model for the sale price of houses.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think square footage, month sold, year sold, neighborhood, year remodelled, overall quality, overall condition, garage type and building type will help determine the sale price of the house sold.

Did you create any new variables from existing variables in the dataset?

I created a few new variables from existing variables:

1) TotalBaths - the total number of baths.
2) LotAreaCat - categorised lot area.
3) LivAreaCat - categorised square footage.
4) FinBsmt - basement finish.
5) LivOrRec - living quarters or recreation room in the basement.
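The report does not show how these variables were built, so the following is only a minimal sketch of one plausible construction in base R. The toy rows are made up, and the bath-counting rule, the bin break points and the "finished" rule are all assumptions rather than the definitions actually used.

```r
# Toy rows with made-up values; column names follow the data description.
housing <- data.frame(
  FullBath     = c(2, 1, 2),
  HalfBath     = c(1, 0, 1),
  BsmtFullBath = c(1, 1, 0),
  BsmtHalfBath = c(0, 0, 0),
  LotArea      = c(8450, 9600, 14260),
  GrLivArea    = c(1710, 1262, 2198),
  BsmtFinType1 = c("GLQ", "Unf", "Rec"),
  stringsAsFactors = FALSE
)

# Assumption: every bath, full or half, above or below grade, counts as one.
housing$TotalBaths <- with(housing,
                           FullBath + HalfBath + BsmtFullBath + BsmtHalfBath)

# Assumption: 5,000 sq ft bins for lot area (break points are illustrative).
housing$LotAreaCat <- cut(housing$LotArea,
                          breaks = c(0, 5000, 10000, 15000, 20000, Inf),
                          ordered_result = TRUE)

# Assumption: five living-area bins, 500 sq ft wide up to 2,000, then open-ended.
housing$LivAreaCat <- cut(housing$GrLivArea,
                          breaks = c(0, 500, 1000, 1500, 2000, Inf),
                          ordered_result = TRUE)

# Assumption: a basement is "finished" when its type is neither "Unf" nor missing.
housing$FinBsmt <- !is.na(housing$BsmtFinType1) & housing$BsmtFinType1 != "Unf"
```

Using `ordered_result = TRUE` keeps the bins as ordered factors, which is what lets the categories take part in a correlation matrix once converted to integer ranks.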

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I found that the sale price had a skewed distribution, so I applied a log transformation. The log-transformed sale price is approximately normally distributed. I also subsetted the data to include only the features of interest and the unique identifier for each house sold.
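As a rough illustration of why the transformation helps, the sketch below compares the skewness of a handful of made-up prices before and after taking log10. The `skewness` helper is a hypothetical name defined here, not a function from the report, and the prices are toy values chosen only to have a long right tail.

```r
# Toy prices (made-up values) with a long right tail.
sale_price <- c(34900, 120000, 163000, 180000, 214000, 755000)

# Moment-based sample skewness; a value of 0 indicates a symmetric sample.
skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

log_sale_price <- log10(sale_price)

skew_raw <- skewness(sale_price)      # clearly positive for the raw prices
skew_log <- skewness(log_sale_price)  # much closer to zero after the transform
```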

Bivariate Plots Section

Correlations with SalePrice

To look at the effect of the variables of interest (those with an inherent logical order) on Sale Price, I drew the 3 matrix plots below. I considered only correlations greater than 0.6 for further analysis.
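The 0.6 cut-off can also be applied programmatically. The sketch below is an assumption about how that filtering might look in base R, using a toy data frame with made-up values rather than the housing data.

```r
# Toy numeric data (made-up values standing in for the housing features).
toy <- data.frame(
  LogSalePrice = c(5.32, 5.26, 5.35, 5.30, 5.40, 5.16),
  OverallQual  = c(7, 6, 7, 7, 8, 5),
  MoSold       = c(2, 5, 9, 2, 12, 10)
)

# Pearson correlation of every other column with the log sale price.
cors <- cor(toy)[, "LogSalePrice"]
cors <- cors[names(cors) != "LogSalePrice"]

# Keep only the features whose absolute correlation exceeds the 0.6 cut-off.
strong <- names(cors)[abs(cors) > 0.6]
```

Filtering on the absolute value means a strong negative correlation would also be kept.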

plot 1

In this plot, the most interesting correlations were between log10(SalePrice) and other features such as OverallQual (0.82), TotalBaths (0.67), LotAreaCat (0.69) and LivAreaCat (0.69). LivAreaCat and LotAreaCat had a correlation of 1, but this is not surprising. I had expected some correlation between OverallQual and OverallCond, but this was not the case. Similarly, the correlations of SalePrice with YearBuilt and YearRemodAdd were only 0.59 and 0.57 respectively, and the correlation between those two features was 0.59.

plot 2

In this plot, ExterQual, BsmtQual and TotBsmtSqft have correlations of 0.68, 0.65 and 0.61 respectively with log10(SalePrice). There was also a correlation of 0.64 between ExterQual and BsmtQual. Interestingly, the correlation between TotRmsAbvGrd and log10(SalePrice) was negligible.

plot 3

In this plot, KitchenQual, GarageCars and GarageArea have correlations of 0.67, 0.68 and 0.65 respectively with log10(SalePrice). There is also an expected high correlation of 0.88 between GarageCars and GarageArea. Interestingly, TotRmsAbvGrd has a correlation of only 0.53 with log10(SalePrice). It also seems that the month and year of the sale have no effect on price. GarageYrBlt seems to have had some (though modest) association with GarageFinish (0.53), GarageCars (0.59) and GarageArea (0.56).

Follow-up on inter-variable correlations

Since most of the quality features had a high correlation with SalePrice, I thought it was important to look at the correlations between these variables.

This plot shows that all the quality variables are correlated, with correlation coefficients > 0.65. This makes me wonder if OverallQual explains most of the variation in SalePrice.
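Correlating the quality ratings requires turning their text codes into numbers first. The report does not say how this was done; the sketch below assumes a simple rank encoding based on the Po/Fa/TA/Gd/Ex scale in the data description, applied to made-up ratings. `to_score` is a hypothetical helper name.

```r
# Quality codes from the data description, ordered from worst to best.
quality_levels <- c("Po", "Fa", "TA", "Gd", "Ex")

# Toy ratings (made-up values).
toy <- data.frame(
  ExterQual   = c("Gd", "TA", "Gd", "TA", "Ex", "Fa"),
  KitchenQual = c("Gd", "TA", "Ex", "Gd", "Ex", "TA"),
  stringsAsFactors = FALSE
)

# Map each rating to its rank (Po = 1, ..., Ex = 5); unknown codes become NA.
to_score <- function(x) as.integer(factor(x, levels = quality_levels))
scores <- as.data.frame(lapply(toy, to_score))

# Pearson correlation between the encoded ratings.
quality_cor <- cor(scores)
```

Because the scores are ordinal, `cor(scores, method = "spearman")` is a reasonable alternative that depends only on the ordering of the ratings.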

Similarly, I decided to look at the features that look at the area for the houses sold.

This plot shows that the ‘area’ features are not as strongly correlated with each other as the ‘quality’ features, except for LotAreaCat and LivAreaCat, as noted before.

The last one I wanted to look at was the correlation between TotalBaths, GarageCars, TotRmsAbvGrd and LivAreaCat.

I had expected a higher correlation than 0.46 between TotalBaths and TotRmsAbvGrd. There is a high correlation between LivAreaCat and TotRmsAbvGrd, as expected.

Other categorical variables

For the other categorical variables I looked at boxplots ordered on median sale price for each category of each variable.

In the above boxplots, each feature was sorted for median SalePrice of each category of the feature and plotted. The median SalePrice was also plotted as a red dashed line.
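The plotting code is not part of this report, so the following is only a sketch of how the ordering could be done in base R, on made-up prices for three of the neighborhoods. It relies on `reorder()`, which sorts factor levels by a summary statistic of a second variable.

```r
# Toy data (made-up prices) for three neighborhoods from the data description.
toy <- data.frame(
  Neighborhood = c("NridgHt", "NridgHt", "SWISU", "SWISU", "CollgCr", "CollgCr"),
  SalePrice    = c(300000, 320000, 110000, 130000, 190000, 210000)
)

# Sort the factor levels by the median sale price within each category.
toy$Neighborhood <- reorder(factor(toy$Neighborhood), toy$SalePrice, FUN = median)
ordered_levels <- levels(toy$Neighborhood)

# Draw the plot only in an interactive session.
if (interactive()) {
  boxplot(SalePrice ~ Neighborhood, data = toy)
  # Overall median as a red dashed reference line.
  abline(h = median(toy$SalePrice), lty = 2, col = "red")
}
```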

The features LivOrRec, LotShape, LotConfig, BldgType, Exterior1st, Exterior2nd, Foundation, Heating and Electrical appear to have little effect on SalePrice in these boxplots. Thus I won’t be considering them further.

As expected, Neighborhood has an effect on SalePrice. Also to note, houses with Builtin & Attached Garages had higher SalePrice compared to the rest.

A closer look at the highest correlation (0.82)

I decided to extract the coefficients of a linear model of LogSalePrice ~ OverallQual and plot it again.

## 
## Call:
## lm(formula = LogSalePrice ~ OverallQual, data = housing_interest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46396 -0.05634  0.00569  0.05790  0.40145 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.596765   0.011842  388.18   <2e-16 ***
## OverallQual 0.102506   0.001893   54.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1 on 1458 degrees of freedom
## Multiple R-squared:  0.6678, Adjusted R-squared:  0.6676 
## F-statistic:  2931 on 1 and 1458 DF,  p-value: < 2.2e-16
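To make the coefficients easier to read, they can be translated back to the price scale. The short calculation below uses only the estimates printed above; the back-transformation assumes LogSalePrice is log10(SalePrice), as the text describes, and `predict_price` is a hypothetical helper name.

```r
# Estimates copied from the model summary above.
intercept <- 4.596765
slope     <- 0.102506

# On a log10 scale, each extra point of OverallQual multiplies the fitted
# price by 10^slope (about 1.27, i.e. roughly a 27% increase per point).
price_ratio <- 10^slope

# Fitted price (back-transformed from the log10 scale) for a given quality.
predict_price <- function(overall_qual) 10^(intercept + slope * overall_qual)

# In a one-predictor regression, sqrt(R-squared) equals |r|,
# which matches the 0.82 correlation reported for OverallQual.
r <- sqrt(0.6678)
```

Note that back-transforming a fitted log value gives a typical (geometric-mean-like) price for that quality level rather than an arithmetic mean.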

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The highest correlation with log10(sale price), 0.82, was with OverallQual. This was followed by ExterQual (0.68), LivAreaCat (0.69), LotAreaCat (0.69), KitchenQual (0.67), BsmtQual (0.65), GarageCars (0.68), TotalBaths (0.67), GarageArea (0.65) and TotBsmtSqft (0.61). It is important to note that these features are not all independent and have some correlation between them; for example, LivAreaCat and LotAreaCat are completely correlated, with a correlation coefficient of 1. SalePrice also varies clearly by Neighborhood. In addition, houses with Builtin & Attached Garages had higher SalePrice compared to the rest.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

There was no correlation between OverallCond and OverallQual. All the quality variables are correlated, with correlation coefficients > 0.65. This makes me wonder if OverallQual explains most of the variation in SalePrice. GarageYrBlt seems to have had some (though modest) association with GarageFinish (0.51), GarageCars (0.59) and GarageArea (0.56). The correlation between BedroomAbvGr and SalePrice was negligible. The month and year the houses were sold had no effect on the SalePrice.

What was the strongest relationship you found?

The strongest relationship for SalePrice was with OverallQual of the houses.

Multivariate Plots Section

In this section, I wanted to explore at least a few of the effects of the various features together and a few others.

Month/Year of sale

This plot shows that the median SalePrice was not affected by the Year or Month of Sale. This is surprising, as the US housing market is known to have gone through a major downturn in 2008.
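The medians behind a plot like this can be tabulated with `aggregate()`. The sketch below uses made-up sales, not the housing data, and only illustrates the grouping step.

```r
# Toy sales (made-up values); column names follow the data description.
toy <- data.frame(
  YrSold    = c(2006, 2006, 2008, 2008, 2008, 2010),
  MoSold    = c(6, 7, 6, 6, 7, 6),
  SalePrice = c(150000, 170000, 160000, 180000, 165000, 155000)
)

# Median sale price per year of sale.
by_year <- aggregate(SalePrice ~ YrSold, data = toy, FUN = median)

# Median sale price per year and month combination that occurs in the data.
by_year_month <- aggregate(SalePrice ~ YrSold + MoSold, data = toy, FUN = median)
```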

OverallQual, SalePrice and LivAreaCat

To begin exploring the total living area and overall quality with sale price, I plotted these three features.

This plot clearly shows an increase in LogSalePrice with Overall Quality and Living Area Category. It also shows that the smaller the area of the house, the lower the quality tends to be.

I created the next plot to look at whether remodelling had any effect on overall quality and sale price in each neighborhood.

In this plot, the sale price / square foot does not appear to be affected by remodelling. The relationship between overall quality and sale price / square foot in each neighborhood can be clearly seen in this plot.
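The two quantities in this plot can be derived as sketched below. The remodel flag follows the data description, which states that YearRemodAdd equals the construction date when there was no remodelling. Dividing by GrLivArea is an assumption about which area was used, and the sale prices and areas in the toy rows are made up.

```r
# Toy rows; sale prices and areas are made-up values.
toy <- data.frame(
  SalePrice    = c(208500, 181500, 140000),
  GrLivArea    = c(1710, 1262, 1717),
  YearBuilt    = c(2003, 1976, 1915),
  YearRemodAdd = c(2003, 1976, 1970)
)

# Assumption: price per square foot is based on above-grade living area.
toy$PricePerSqFt <- toy$SalePrice / toy$GrLivArea

# A house counts as remodelled when the remodel year differs from the build year.
toy$Remodelled <- toy$YearRemodAdd != toy$YearBuilt

# Median price per square foot for remodelled vs. non-remodelled houses.
median_by_remodel <- tapply(toy$PricePerSqFt, toy$Remodelled, median)
```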

In the next plot, I looked at Sale Price per square foot of garage area by garage type, coloured by garage finish.

In this plot, there is no correlation between Sale Price and the area, type or finish of the garage. However, there seems to be a pattern that most detached garages are unfinished.

In the next plot, I wanted to look at the relationships between Overall Quality and the other quality variables.

The plot shows the correlations between the quality variables. As expected, the overall quality was higher when Exterior, Kitchen and Basement Quality were all towards Excellent.

In this plot I looked at the sale price in terms of overall quality, total baths and total rooms above grade. With some exceptions, sale price clearly increases with quality, total baths and rooms above grade.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The living area was also associated with quality and sale price. This was apparent in the first plot, which looked at overall quality versus sale price for each living area category, and in the last plot, which considered the total baths and total rooms above grade.

The neighborhood had an effect on the sale price, but it didn’t matter whether the house was remodelled or not. The neighborhood Northridge Heights (NridgHt) had the highest sale price / square foot but also the best quality houses. At the other end was the South & West of Iowa State University (SWISU) neighborhood, with low quality houses at the lowest sale price / square foot.

The correlations between the quality variables could easily be seen in the second-to-last plot.

Were there any interesting or surprising interactions between features?

The garage variables did not seem to have much effect on the sale price.

Final Plots and Summary

Plot One

Description One

The distribution of the sale price of the houses sold on log scale appears to be almost normal. The price ranges from $34,900 to $755,000.

Plot Two

Description Two

The feature “OverallQual” had the highest Pearson correlation, 0.82, with Sale Price on the log scale. This plot shows the fitted line of the linear regression model between Overall Quality and Sale Price.

Plot Three

Description Three

The effect of Neighborhood and Overall Quality on Sale Price/Square Foot is summarised well in this plot. In addition, I observed that there was no effect on SalePrice according to whether the houses were remodelled or not.

Reflection

This housing data set had 1460 houses and 81 features including “id”. I began by exploring all the 80 features (excluding Id). My main feature of interest was Sale Price. The data set looks at houses sold between 2006 and 2010 in Iowa.

I struggled with the number of features and with filtering them. I was extremely surprised that the SalePrice did not vary with the year of sale, since the US housing market is known to have gone through a major recession in 2008[1]. Thus, the dataset could be fictional.

However, in the data I looked at, the overall quality of the house, the living area square footage and the neighborhood were the major driving factors for the sale price. These match the main things that a buyer would consider.

I didn’t create a linear model because the dataset had information on only 1,460 houses and too many features to consider, many of which were correlated.

This dataset could be further explored for all the intercorrelations between the features (which I just touched upon) and how much they actually affect the Sale Price.

[1] https://en.wikipedia.org/wiki/Timeline_of_the_United_States_housing_bubble

Appendix

MSSubClass: Identifies the type of dwelling involved in the sale.

    20  1-STORY 1946 & NEWER ALL STYLES
    30  1-STORY 1945 & OLDER
    40  1-STORY W/FINISHED ATTIC ALL AGES
    45  1-1/2 STORY - UNFINISHED ALL AGES
    50  1-1/2 STORY FINISHED ALL AGES
    60  2-STORY 1946 & NEWER
    70  2-STORY 1945 & OLDER
    75  2-1/2 STORY ALL AGES
    80  SPLIT OR MULTI-LEVEL
    85  SPLIT FOYER
    90  DUPLEX - ALL STYLES AND AGES
   120  1-STORY PUD (Planned Unit Development) - 1946 & NEWER
   150  1-1/2 STORY PUD - ALL AGES
   160  2-STORY PUD - 1946 & NEWER
   180  PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
   190  2 FAMILY CONVERSION - ALL STYLES AND AGES

MSZoning: Identifies the general zoning classification of the sale.

   A    Agriculture
   C    Commercial
   FV   Floating Village Residential
   I    Industrial
   RH   Residential High Density
   RL   Residential Low Density
   RP   Residential Low Density Park 
   RM   Residential Medium Density

LotFrontage: Linear feet of street connected to property

LotArea: Lot size in square feet

Street: Type of road access to property

   Grvl Gravel  
   Pave Paved
    

Alley: Type of alley access to property

   Grvl Gravel
   Pave Paved
   NA   No alley access
    

LotShape: General shape of property

   Reg  Regular 
   IR1  Slightly irregular
   IR2  Moderately Irregular
   IR3  Irregular
   

LandContour: Flatness of the property

   Lvl  Near Flat/Level 
   Bnk  Banked - Quick and significant rise from street grade to building
   HLS  Hillside - Significant slope from side to side
   Low  Depression
    

Utilities: Type of utilities available

   AllPub   All public Utilities (E,G,W,& S)    
   NoSewr   Electricity, Gas, and Water (Septic Tank)
   NoSeWa   Electricity and Gas Only
   ELO  Electricity only    

LotConfig: Lot configuration

   Inside   Inside lot
   Corner   Corner lot
   CulDSac  Cul-de-sac
   FR2  Frontage on 2 sides of property
   FR3  Frontage on 3 sides of property

LandSlope: Slope of property

   Gtl  Gentle slope
   Mod  Moderate Slope  
   Sev  Severe Slope

Neighborhood: Physical locations within Ames city limits

   Blmngtn  Bloomington Heights
   Blueste  Bluestem
   BrDale   Briardale
   BrkSide  Brookside
   ClearCr  Clear Creek
   CollgCr  College Creek
   Crawfor  Crawford
   Edwards  Edwards
   Gilbert  Gilbert
   IDOTRR   Iowa DOT and Rail Road
   MeadowV  Meadow Village
   Mitchel  Mitchell
   NAmes    North Ames
   NoRidge  Northridge
   NPkVill  Northpark Villa
   NridgHt  Northridge Heights
   NWAmes   Northwest Ames
   OldTown  Old Town
   SWISU    South & West of Iowa State University
   Sawyer   Sawyer
   SawyerW  Sawyer West
   Somerst  Somerset
   StoneBr  Stone Brook
   Timber   Timberland
   Veenker  Veenker
        

Condition1: Proximity to various conditions

   Artery   Adjacent to arterial street
   Feedr    Adjacent to feeder street   
   Norm Normal  
   RRNn Within 200' of North-South Railroad
   RRAn Adjacent to North-South Railroad
   PosN Near positive off-site feature--park, greenbelt, etc.
   PosA Adjacent to positive off-site feature
   RRNe Within 200' of East-West Railroad
   RRAe Adjacent to East-West Railroad

Condition2: Proximity to various conditions (if more than one is present)

   Artery   Adjacent to arterial street
   Feedr    Adjacent to feeder street   
   Norm Normal  
   RRNn Within 200' of North-South Railroad
   RRAn Adjacent to North-South Railroad
   PosN Near positive off-site feature--park, greenbelt, etc.
   PosA Adjacent to positive off-site feature
   RRNe Within 200' of East-West Railroad
   RRAe Adjacent to East-West Railroad

BldgType: Type of dwelling

   1Fam Single-family Detached  
   2FmCon   Two-family Conversion; originally built as one-family dwelling
   Duplx    Duplex
   TwnhsE   Townhouse End Unit
   TwnhsI   Townhouse Inside Unit

HouseStyle: Style of dwelling

   1Story   One story
   1.5Fin   One and one-half story: 2nd level finished
   1.5Unf   One and one-half story: 2nd level unfinished
   2Story   Two story
   2.5Fin   Two and one-half story: 2nd level finished
   2.5Unf   Two and one-half story: 2nd level unfinished
   SFoyer   Split Foyer
   SLvl Split Level

OverallQual: Rates the overall material and finish of the house

   10   Very Excellent
   9    Excellent
   8    Very Good
   7    Good
   6    Above Average
   5    Average
   4    Below Average
   3    Fair
   2    Poor
   1    Very Poor

OverallCond: Rates the overall condition of the house

   10   Very Excellent
   9    Excellent
   8    Very Good
   7    Good
   6    Above Average   
   5    Average
   4    Below Average   
   3    Fair
   2    Poor
   1    Very Poor
    

YearBuilt: Original construction date

YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)

RoofStyle: Type of roof

   Flat Flat
   Gable    Gable
   Gambrel  Gambrel (Barn)
   Hip  Hip
   Mansard  Mansard
   Shed Shed
    

RoofMatl: Roof material

   ClyTile  Clay or Tile
   CompShg  Standard (Composite) Shingle
   Membran  Membrane
   Metal    Metal
   Roll Roll
   Tar&Grv  Gravel & Tar
   WdShake  Wood Shakes
   WdShngl  Wood Shingles
    

Exterior1st: Exterior covering on house

   AsbShng  Asbestos Shingles
   AsphShn  Asphalt Shingles
   BrkComm  Brick Common
   BrkFace  Brick Face
   CBlock   Cinder Block
   CemntBd  Cement Board
   HdBoard  Hard Board
   ImStucc  Imitation Stucco
   MetalSd  Metal Siding
   Other    Other
   Plywood  Plywood
   PreCast  PreCast 
   Stone    Stone
   Stucco   Stucco
   VinylSd  Vinyl Siding
   Wd Sdng  Wood Siding
   WdShing  Wood Shingles

Exterior2nd: Exterior covering on house (if more than one material)

   AsbShng  Asbestos Shingles
   AsphShn  Asphalt Shingles
   BrkComm  Brick Common
   BrkFace  Brick Face
   CBlock   Cinder Block
   CemntBd  Cement Board
   HdBoard  Hard Board
   ImStucc  Imitation Stucco
   MetalSd  Metal Siding
   Other    Other
   Plywood  Plywood
   PreCast  PreCast
   Stone    Stone
   Stucco   Stucco
   VinylSd  Vinyl Siding
   Wd Sdng  Wood Siding
   WdShing  Wood Shingles

MasVnrType: Masonry veneer type

   BrkCmn   Brick Common
   BrkFace  Brick Face
   CBlock   Cinder Block
   None None
   Stone    Stone

MasVnrArea: Masonry veneer area in square feet

ExterQual: Evaluates the quality of the material on the exterior

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   Po   Poor
    

ExterCond: Evaluates the present condition of the material on the exterior

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   Po   Poor
    

Foundation: Type of foundation

   BrkTil   Brick & Tile
   CBlock   Cinder Block
   PConc    Poured Concrete
   Slab Slab
   Stone    Stone
   Wood Wood
    

BsmtQual: Evaluates the height of the basement

   Ex   Excellent (100+ inches) 
   Gd   Good (90-99 inches)
   TA   Typical (80-89 inches)
   Fa   Fair (70-79 inches)
   Po   Poor (<70 inches)
   NA   No Basement
    

BsmtCond: Evaluates the general condition of the basement

   Ex   Excellent
   Gd   Good
   TA   Typical - slight dampness allowed
   Fa   Fair - dampness or some cracking or settling
   Po   Poor - Severe cracking, settling, or wetness
   NA   No Basement

BsmtExposure: Refers to walkout or garden level walls

   Gd   Good Exposure
   Av   Average Exposure (split levels or foyers typically score average or above)  
   Mn   Minimum Exposure
   No   No Exposure
   NA   No Basement

BsmtFinType1: Rating of basement finished area

   GLQ  Good Living Quarters
   ALQ  Average Living Quarters
   BLQ  Below Average Living Quarters   
   Rec  Average Rec Room
   LwQ  Low Quality
   Unf  Unfinished
   NA   No Basement
    

BsmtFinSF1: Type 1 finished square feet

BsmtFinType2: Rating of basement finished area (if multiple types)

   GLQ  Good Living Quarters
   ALQ  Average Living Quarters
   BLQ  Below Average Living Quarters   
   Rec  Average Rec Room
   LwQ  Low Quality
   Unf  Unfinished
   NA   No Basement

BsmtFinSF2: Type 2 finished square feet

BsmtUnfSF: Unfinished square feet of basement area

TotalBsmtSF: Total square feet of basement area

Heating: Type of heating

   Floor    Floor Furnace
   GasA Gas forced warm air furnace
   GasW Gas hot water or steam heat
   Grav Gravity furnace 
   OthW Hot water or steam heat other than gas
   Wall Wall furnace
    

HeatingQC: Heating quality and condition

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   Po   Poor
    

CentralAir: Central air conditioning

   N    No
   Y    Yes
    

Electrical: Electrical system

   SBrkr    Standard Circuit Breakers & Romex
   FuseA    Fuse Box over 60 AMP and all Romex wiring (Average) 
   FuseF    60 AMP Fuse Box and mostly Romex wiring (Fair)
   FuseP    60 AMP Fuse Box and mostly knob & tube wiring (poor)
   Mix  Mixed
    

1stFlrSF: First Floor square feet

2ndFlrSF: Second floor square feet

LowQualFinSF: Low quality finished square feet (all floors)

GrLivArea: Above grade (ground) living area square feet

BsmtFullBath: Basement full bathrooms

BsmtHalfBath: Basement half bathrooms

FullBath: Full bathrooms above grade

HalfBath: Half baths above grade

BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms)

KitchenAbvGr: Kitchens above grade

KitchenQual: Kitchen quality

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
    

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

   Typ  Typical Functionality
   Min1 Minor Deductions 1
   Min2 Minor Deductions 2
   Mod  Moderate Deductions
   Maj1 Major Deductions 1
   Maj2 Major Deductions 2
   Sev  Severely Damaged
   Sal  Salvage only
    

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

   Ex   Excellent - Exceptional Masonry Fireplace
   Gd   Good - Masonry Fireplace in main level
   TA   Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
   Fa   Fair - Prefabricated Fireplace in basement
   Po   Poor - Ben Franklin Stove
   NA   No Fireplace
    

GarageType: Garage location

   2Types   More than one type of garage
   Attchd   Attached to home
   Basment  Basement Garage
   BuiltIn  Built-In (Garage part of house - typically has room above garage)
   CarPort  Car Port
   Detchd   Detached from home
   NA   No Garage
    

GarageYrBlt: Year garage was built

GarageFinish: Interior finish of the garage

   Fin  Finished
   RFn  Rough Finished  
   Unf  Unfinished
   NA   No Garage
    

GarageCars: Size of garage in car capacity

GarageArea: Size of garage in square feet

GarageQual: Garage quality

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
   NA   No Garage
    

GarageCond: Garage condition

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
   NA   No Garage
    

PavedDrive: Paved driveway

   Y    Paved 
   P    Partial Pavement
   N    Dirt/Gravel
    

WoodDeckSF: Wood deck area in square feet

OpenPorchSF: Open porch area in square feet

EnclosedPorch: Enclosed porch area in square feet

3SsnPorch: Three season porch area in square feet

ScreenPorch: Screen porch area in square feet

PoolArea: Pool area in square feet

PoolQC: Pool quality

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   NA   No Pool
    

Fence: Fence quality

   GdPrv    Good Privacy
   MnPrv    Minimum Privacy
   GdWo Good Wood
   MnWw Minimum Wood/Wire
   NA   No Fence

MiscFeature: Miscellaneous feature not covered in other categories

   Elev Elevator
   Gar2 2nd Garage (if not described in garage section)
   Othr Other
   Shed Shed (over 100 SF)
   TenC Tennis Court
   NA   None
    

MiscVal: $Value of miscellaneous feature

MoSold: Month Sold (MM)

YrSold: Year Sold (YYYY)

SaleType: Type of sale

   WD   Warranty Deed - Conventional
   CWD  Warranty Deed - Cash
   VWD  Warranty Deed - VA Loan
   New  Home just constructed and sold
   COD  Court Officer Deed/Estate
   Con  Contract 15% Down payment regular terms
   ConLw    Contract Low Down payment and low interest
   ConLI    Contract Low Interest
   ConLD    Contract Low Down
   Oth  Other
    

SaleCondition: Condition of sale

   Normal   Normal Sale
   Abnorml  Abnormal Sale -  trade, foreclosure, short sale
   AdjLand  Adjoining Land Purchase
   Alloca   Allocation - two linked properties with separate deeds, typically condo with a garage unit  
   Family   Sale between family members
   Partial  Home was not completed when last assessed (associated with New Homes)